A compiler approach for exploiting partial SIMD parallelism


Abstract

Existing vectorization techniques are ineffective for loops that exhibit little loop-level parallelism but some limited superword-level parallelism (SLP). We show that effectively vectorizing such loops requires partial vector operations to be executed correctly and efficiently, where the degree of partial SIMD parallelism is smaller than the SIMD datapath width. We present a simple yet effective SLP compiler technique called PAVER (PArtial VEctorizeR), formulated and implemented in LLVM as a generalization of the traditional SLP algorithm, to optimize such partially vectorizable loops. The key idea is to maximize SIMD utilization by widening the vector instructions used while minimizing the overheads caused by memory access, packing/unpacking, and/or masking operations, without introducing new memory errors or new numeric exceptions. For a set of 9 C/C++/Fortran applications with partial SIMD parallelism, PAVER achieves significantly better kernel and whole-program speedups than LLVM on both Intel's AVX and ARM's NEON.
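To make the notion of partial SIMD parallelism concrete, the following is a minimal sketch in plain C, not PAVER's algorithm or LLVM's output. It assumes a 4-lane float datapath and models a vector register with a hypothetical `vec4` struct; the helper names (`pack_partial`, `vec_div`, `store_partial`, `scale3_partial`) are illustrative only. Three isomorphic statements fill three of the four lanes; the idle divisor lane is padded with 1.0 so the extra lane cannot raise a division exception, and the store writes only the active lanes so no memory outside the original accesses is touched.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4 /* assumed datapath width: four float lanes */

/* Hypothetical software stand-in for one SIMD register. */
typedef struct { float lane[LANES]; } vec4;

/* Pack `active` scalars into a vector. Idle lanes receive `filler`,
   and nothing beyond src[active - 1] is read. */
static vec4 pack_partial(const float *src, size_t active, float filler) {
    assert(active <= LANES);
    vec4 v;
    for (size_t i = 0; i < LANES; ++i)
        v.lane[i] = (i < active) ? src[i] : filler;
    return v;
}

/* Models one full-width vector divide: every lane is computed,
   including the idle one. */
static vec4 vec_div(vec4 a, vec4 b) {
    vec4 r;
    for (size_t i = 0; i < LANES; ++i)
        r.lane[i] = a.lane[i] / b.lane[i];
    return r;
}

/* Models a masked store: only the first `active` lanes reach memory. */
static void store_partial(float *dst, vec4 v, size_t active) {
    assert(active <= LANES);
    for (size_t i = 0; i < active; ++i)
        dst[i] = v.lane[i];
}

/* Scalar reference: three isomorphic statements, one short of a full vector. */
static void scale3_scalar(float *out, const float *p, const float *w) {
    out[0] = p[0] / w[0];
    out[1] = p[1] / w[1];
    out[2] = p[2] / w[2];
}

/* Partially vectorized form: 3 active lanes on a 4-lane datapath. */
static void scale3_partial(float *out, const float *p, const float *w) {
    vec4 a = pack_partial(p, 3, 0.0f);
    vec4 b = pack_partial(w, 3, 1.0f); /* 1.0 filler: idle lane computes 0/1, never x/0 */
    store_partial(out, vec_div(a, b), 3);
}
```

In this sketch the packing and the masked store are the overheads the abstract refers to; whether the single wide divide pays for them depends on the target and on how the operands are already laid out in memory.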


Bibliographic details

Authors: Zhou, H; Xue, J
Year: 2016
Original format: PDF

Similar literature


1. A Compiler Approach for Exploiting Partial SIMD Parallelism [J]. Zhou Hao, Xue Jingling. ACM Transactions on Architecture and Code Optimization. 2016, No. 1

2. Exploiting SIMD parallelism on dynamically partitioned parallel network coding for P2P systems [J]. Deokho Kim, Karam Park, Won W. Ro. Computers and Electrical Engineering. 2013, No. 1

3. Exploiting parallelism in geometry processing with general purpose processors and floating-point SIMD instructions [J]. Chia-Lin Yang, Sano B. IEEE Transactions on Computers. 2000, No. 9

4. Exploiting SIMD Parallelism with the CGIS Compiler Framework [C]. Nicolas Fritz, Philipp Lucas, Reinhard Wilhelm. International Workshop on Languages and Compilers for Parallel Computing. 2008

5. Exploiting Thread-Level Parallelism on Reconfigurable Architectures: a Cross-Layer Approach [D]. Momeni, Amir. 2017

6. Rubus: A compiler for seamless and extensible parallelism [O]. Muhammad Adnan, Faisal Aslam, Zubair Nawaz. 2011

7. Compiler techniques for improving SIMD parallelism [O]. Zhou Hao. Computer Science Engineering, Faculty of Engineering, UNSW. 2016

8. Exploiting Parallelism in Geometry Processing with General Purpose Processors and Floating-Point SIMD Instructions [R]. Yang, C., Sano, B., Lebeck, A. R. 2005
